Towards Conceptual Indexing of the Blogosphere through Wikipedia Topic Hierarchy

Authors

  • Mariko Kawaba
  • Daisuke Yokomoto
  • Hiroyuki Nakasaki
  • Takehito Utsuro
  • Tomohiro Fukuhara
Abstract

This paper studies the conceptual indexing of the blogosphere through the whole hierarchy of Wikipedia entries. About 300,000 Wikipedia entries are used to represent a hierarchy of topics. Based on the results of judging whether each blog feed is relevant to a given Wikipedia entry, this paper proposes a method for judging whether there exist blog feeds that should be linked from that entry. In our experimental evaluation, we achieved over 90% precision in this task.


Similar resources

A Wikipedia Based Semantic Graph Model for Topic Tracking in Blogsphere

There are two key issues for information diffusion in blogosphere: (1) blog posts are usually short, noisy and contain multiple themes, (2) information diffusion through blogosphere is primarily driven by the “word-of-mouth” effect, thus making topics evolve very fast. This paper presents a novel topic tracking approach to deal with these issues by modeling a topic as a semantic graph, in which...


Predicting Central Topics in a Blog Corpus from a Networks Perspective

In today’s content-centric Internet, blogs are becoming increasingly popular and important from a data analysis perspective. According to Wikipedia, there were over 156 million public blogs on the Internet as of February 2011. Blogs are a reflection of our contemporary society. The contents of different blog posts are important from social, psychological, economical and political perspectives. ...


Topic Classification of Blog Posts Using Distant Supervision

Classifying blog posts by topics is useful for applications such as search and marketing. However, topic classification is time consuming and error prone, especially in an open domain such as the blogosphere. The state-of-the-art relies on supervised methods, requiring considerable training effort, that use the whole corpus vocabulary as features, demanding considerable memory to process. We sh...


Conceptual document indexing using a large scale semantic dictionary providing a concept hierarchy

Automatic indexing is one of the important technologies used for Textual Data Analysis applications. Standard document indexing techniques usually identify the most relevant keywords in the documents. This paper presents an alternative approach that aims at performing document indexing by associating concepts with the document to index instead of extracting keywords out of it. The concepts are ...


Topic structure extraction for meeting indexing

This paper describes a system that automatically generates meeting minutes by extracting a topic hierarchy from a meeting’s speech. The topic hierarchy is a tree structure whose nodes comprise a topic summary. The topic structure extraction process converts speech recognition results into a word conceptual vector sequence and divides the sequence into the topic segments (topic segmentation). It...




Publication date: 2009